Visualizing Categorical Data

  • Graphical methods for categorical data are less developed than those available for numeric variables.

  • The hierarchical structure of counts and proportions adds a subtle complexity.

  • There are three types of distributions to consider: joint, marginal, and conditional.

  • Mosaic plots are one way to visualize multidimensional categorical data, and can be both powerful and easy to use.
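
The three distribution types named above can be made concrete with a small two-way table. The following is a minimal sketch in Python, not code from ggmosaic; the variable names and all counts are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical counts for two categorical variables A and B (illustration only).
counts = {
    ("yes", "low"): 30,
    ("yes", "high"): 10,
    ("no", "low"): 20,
    ("no", "high"): 40,
}

total = sum(counts.values())

# Joint distribution: P(A = a, B = b)
joint = {cell: n / total for cell, n in counts.items()}

# Marginal distribution of A: P(A = a), summing the joint over B
marginal_a = defaultdict(float)
for (a, _b), p in joint.items():
    marginal_a[a] += p

# Conditional distribution of B given A: P(B = b | A = a)
conditional_b_given_a = {
    (a, b): p / marginal_a[a] for (a, b), p in joint.items()
}
```

The hierarchy mentioned above shows up here: the conditional proportions are only defined relative to a marginal, and multiplying the two recovers the joint.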

Published by the New York Times

Mosaic Plots

  • map the proportions of a distribution to areas in a graphic
  • form a disjoint partition of a rectangular area
  • are constructed by recursively dividing a square into smaller rectangles, alternating between horizontal and vertical splits
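
The recursive construction can be sketched as follows. This is an illustrative Python version under simplifying assumptions (a unit square, no gaps between tiles, strictly positive counts, categories taken in first-seen order), not the algorithm used by ggmosaic or productplots. The function name `mosaic_rects` and the counts are hypothetical.

```python
def mosaic_rects(counts, xmin=0.0, xmax=1.0, ymin=0.0, ymax=1.0, level=0):
    """Recursively split a rectangle in proportion to counts.

    counts maps tuples of category levels to positive counts.
    Even levels split along x, odd levels split along y.
    Returns {cell: (xmin, xmax, ymin, ymax)}.
    """
    n_vars = len(next(iter(counts)))
    if level == n_vars:
        # Every variable has been used, so exactly one cell remains.
        (key,) = counts
        return {key: (xmin, xmax, ymin, ymax)}

    # Group the cells by their category on the current variable.
    groups = {}
    for key, n in counts.items():
        groups.setdefault(key[level], {})[key] = n
    total = sum(counts.values())

    split_x = level % 2 == 0
    start = xmin if split_x else ymin
    span = (xmax - xmin) if split_x else (ymax - ymin)

    rects = {}
    for sub in groups.values():
        # Each group's share of the current side is its share of the counts.
        size = span * sum(sub.values()) / total
        if split_x:
            rects.update(mosaic_rects(sub, start, start + size, ymin, ymax, level + 1))
        else:
            rects.update(mosaic_rects(sub, xmin, xmax, start, start + size, level + 1))
        start += size
    return rects


# Hypothetical two-way table (illustration only).
counts = {
    ("yes", "low"): 30,
    ("yes", "high"): 10,
    ("no", "low"): 20,
    ("no", "high"): 40,
}
rects = mosaic_rects(counts)
areas = {
    cell: (x1 - x0) * (y1 - y0) for cell, (x0, x1, y0, y1) in rects.items()
}
```

Each entry of `rects` holds the four corner coordinates of one tile, and the tiles cover the unit square without overlap. Real mosaic plots usually add small gaps between tiles, which this sketch omits.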

Example

Creation of ggmosaic

  • Version 2.0.0 of ggplot2 introduced a way for other R packages to implement custom geoms.

  • ggmosaic was created primarily using ggproto and the productplots package.

  • ggmosaic began as an extension of the rect geom.

  • It uses the data handling provided by the productplots package.

  • It calculates xmin, xmax, ymin, and ymax for the rect geom to plot.

Why Mosaic Plots?

Each disjoint segment of the rightmost mosaic plot has an area proportional to the corresponding joint probability.
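
Why area tracks joint probability can be written out in one line. The derivation below assumes a two-variable mosaic drawn on a unit square with no gaps, split first on a variable A and then, within each strip, on a variable B; the symbols A, B, a, and b are introduced here for illustration.

```latex
\text{area}(a, b)
  = \underbrace{P(A = a)}_{\text{width}}
    \times
    \underbrace{P(B = b \mid A = a)}_{\text{height}}
  = P(A = a,\; B = b)
```

The width of a strip encodes the marginal, the height of a tile within it encodes the conditional, and their product is the joint probability.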

GeomMosaic

  • Easy customization
  • Faceting
  • Ease of use
  • Versatility

Translating GeomMosaic for ggplotly()

  • The plotly package contains the infrastructure for translating custom geoms to plotly.

  • GeomMosaic can be reduced to the lower-level geom GeomRect.

  • This allowed us to write a GeomMosaic method for the to_basic() generic function in plotly.

Interactive Mosaic Plots

Examples

Shiny

Conclusion

People have a natural tendency to compare shapes by area, and we can leverage this tendency to depict statistical distributions via mosaic plots.

With GeomMosaic, mosaic plots can be created easily within the ggplot2 framework.